Author: Vinícius Alves
Date: 03/04/23
Version: 1.1
Cyclistic is a bike-share company operating in Chicago. It runs more than 5,800 geotracked bikes locked into a network of 692 stations across the city.
The company offers flexible pricing plans: single-ride passes, full-day passes, and annual memberships.
In this case study we analyze Cyclistic's user data to understand the differences between subscribers (members) and riders who use the service without a commitment (casual riders). From that analysis, we want ideas for how to increase the number of annual memberships.
The data analyzed covers the past 12 months of Cyclistic rides (as of March 2023) and contains no personal information about the riders.
It is stored in an AWS bucket and is already available in wide format.
After reviewing the data, we can say that it is ROCCC (Reliable, Original, Comprehensive, Current, and Cited, a checklist used by Google):
Because the company collected the data itself, it is unlikely to have been modified, and AWS storage gives reasonable assurance against undesired changes to the data.
There are some blank cells in the CSV files, and one file whose name does not follow the same pattern as the others. Both will need some treatment.
# Libraries
import pandas as pd
import numpy as np
import statistics as st
import plotly.express as px
# Load and Concatenate data
archives = ['202202-divvy-tripdata.csv',
'202203-divvy-tripdata.csv',
'202204-divvy-tripdata.csv',
'202205-divvy-tripdata.csv',
'202206-divvy-tripdata.csv',
'202207-divvy-tripdata.csv',
'202208-divvy-tripdata.csv',
'202209-divvy-publictripdata.csv',
'202210-divvy-tripdata.csv',
'202211-divvy-tripdata.csv',
'202212-divvy-tripdata.csv',
'202301-divvy-tripdata.csv',
'202301-divvy-tripdata.csv']  # note: 202301 is listed twice; the duplicated rows are removed in the cleaning step below
data_year = pd.DataFrame()
for archive in archives:
    archive_df = pd.read_csv(f"Data_Cyclistic/{archive}", skipinitialspace=True, index_col=0)
    data_year = pd.concat([data_year, archive_df], axis=0)
data_year
| rideable_type | started_at | ended_at | start_station_name | start_station_id | end_station_name | end_station_id | start_lat | start_lng | end_lat | end_lng | member_casual | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ride_id | ||||||||||||
| E1E065E7ED285C02 | classic_bike | 2022-02-19 18:08:41 | 2022-02-19 18:23:56 | State St & Randolph St | TA1305000029 | Clark St & Lincoln Ave | 13179 | 41.884621 | -87.627834 | 41.915689 | -87.634600 | member |
| 1602DCDC5B30FFE3 | classic_bike | 2022-02-20 17:41:30 | 2022-02-20 17:45:56 | Halsted St & Wrightwood Ave | TA1309000061 | Southport Ave & Wrightwood Ave | TA1307000113 | 41.929143 | -87.649077 | 41.928773 | -87.663913 | member |
| BE7DD2AF4B55C4AF | classic_bike | 2022-02-25 18:55:56 | 2022-02-25 19:09:34 | State St & Randolph St | TA1305000029 | Canal St & Adams St | 13011 | 41.884621 | -87.627834 | 41.879255 | -87.639904 | member |
| A1789BDF844412BE | classic_bike | 2022-02-14 11:57:03 | 2022-02-14 12:04:00 | Southport Ave & Waveland Ave | 13235 | Broadway & Sheridan Rd | 13323 | 41.948150 | -87.663940 | 41.952833 | -87.649993 | member |
| 07DE78092C62F7B3 | classic_bike | 2022-02-16 05:36:06 | 2022-02-16 05:39:00 | State St & Randolph St | TA1305000029 | Franklin St & Lake St | TA1307000111 | 41.884621 | -87.627834 | 41.885837 | -87.635500 | member |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| A303816F2E8A35A8 | electric_bike | 2023-01-11 17:46:23 | 2023-01-11 17:57:31 | Clark St & Elm St | TA1307000039 | Southport Ave & Clybourn Ave | TA1309000030 | 41.902634 | -87.631591 | 41.920771 | -87.663712 | casual |
| BCDBB142CC610382 | classic_bike | 2023-01-30 15:08:10 | 2023-01-30 15:33:26 | Western Ave & Leland Ave | TA1307000140 | Clarendon Ave & Gordon Ter | 13379 | 41.966400 | -87.688704 | 41.957867 | -87.649505 | member |
| 7D1C7CA80517183B | classic_bike | 2023-01-06 19:34:50 | 2023-01-06 19:50:01 | Clark St & Elm St | TA1307000039 | Southport Ave & Clybourn Ave | TA1309000030 | 41.902973 | -87.631280 | 41.920771 | -87.663712 | casual |
| 1A4EB636346DF527 | classic_bike | 2023-01-13 18:59:24 | 2023-01-13 19:14:44 | Clark St & Elm St | TA1307000039 | Southport Ave & Clybourn Ave | TA1309000030 | 41.902973 | -87.631280 | 41.920771 | -87.663712 | casual |
| 069971675AC7DC62 | electric_bike | 2023-01-02 13:48:29 | 2023-01-02 13:59:29 | Clark St & Elm St | TA1307000039 | Southport Ave & Clybourn Ave | TA1309000030 | 41.902822 | -87.631687 | 41.920771 | -87.663712 | casual |
5944549 rows × 12 columns
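One side note on the loading cell above: concatenating inside the loop re-copies every accumulated row on each iteration. If memory or speed becomes a concern, the usual pandas idiom is to collect the frames in a list and concatenate once. A minimal sketch of that pattern, using small hypothetical in-memory CSVs instead of the real Divvy files:

```python
import io
import pandas as pd

# Hypothetical stand-ins for two monthly CSV files.
csv_feb = "ride_id,rideable_type\nA1,classic_bike\nA2,electric_bike\n"
csv_mar = "ride_id,rideable_type\nB1,classic_bike\n"

# Read each file into a list, then concatenate once at the end.
frames = [pd.read_csv(io.StringIO(text), index_col=0) for text in (csv_feb, csv_mar)]
data_year = pd.concat(frames, axis=0)

print(len(data_year))  # 3 rows total
```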
print(f'The columns in the data are {list(data_year.columns)}')
print("Those that deserve attention are the start/end times, the start/end station name/id, and the member_casual classification")
The columns in the data are ['rideable_type', 'started_at', 'ended_at', 'start_station_name', 'start_station_id', 'end_station_name', 'end_station_id', 'start_lat', 'start_lng', 'end_lat', 'end_lng', 'member_casual'] Those that deserve attention are the start/end times, the start/end station name/id, and the member_casual classification
As already seen, I am choosing to use a Jupyter Notebook coded in Python.
data_year.isna().sum()
rideable_type              0
started_at                 0
ended_at                   0
start_station_name    870246
start_station_id      870246
end_station_name      930495
end_station_id        930495
start_lat                  0
start_lng                  0
end_lat                 6026
end_lng                 6026
member_casual              0
dtype: int64
There are 870246 cells without information in start_station_name and start_station_id;
There are 930495 cells without information in end_station_name and end_station_id;
There are 6026 cells without information in end_lat and end_lng.
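Raw counts are easier to judge as shares of the total. A small sketch (on hypothetical data with the same kinds of gaps) of computing the percentage of missing values per column:

```python
import pandas as pd

# Hypothetical mini-frame with gaps like those in the trip data.
df = pd.DataFrame({
    "start_station_name": ["a", None, None, "d"],
    "end_lat": [41.9, 41.8, 41.7, None],
    "member_casual": ["member", "casual", "member", "casual"],
})

# isna().mean() gives the fraction of missing values per column;
# multiplying by 100 turns it into a percentage.
missing_pct = (df.isna().mean() * 100).round(1)
print(missing_pct)
```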
If every row with missing information is dropped, how many rows remain?
new_data_year = data_year.dropna()
new_data_year
| rideable_type | started_at | ended_at | start_station_name | start_station_id | end_station_name | end_station_id | start_lat | start_lng | end_lat | end_lng | member_casual | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ride_id | ||||||||||||
| E1E065E7ED285C02 | classic_bike | 2022-02-19 18:08:41 | 2022-02-19 18:23:56 | State St & Randolph St | TA1305000029 | Clark St & Lincoln Ave | 13179 | 41.884621 | -87.627834 | 41.915689 | -87.634600 | member |
| 1602DCDC5B30FFE3 | classic_bike | 2022-02-20 17:41:30 | 2022-02-20 17:45:56 | Halsted St & Wrightwood Ave | TA1309000061 | Southport Ave & Wrightwood Ave | TA1307000113 | 41.929143 | -87.649077 | 41.928773 | -87.663913 | member |
| BE7DD2AF4B55C4AF | classic_bike | 2022-02-25 18:55:56 | 2022-02-25 19:09:34 | State St & Randolph St | TA1305000029 | Canal St & Adams St | 13011 | 41.884621 | -87.627834 | 41.879255 | -87.639904 | member |
| A1789BDF844412BE | classic_bike | 2022-02-14 11:57:03 | 2022-02-14 12:04:00 | Southport Ave & Waveland Ave | 13235 | Broadway & Sheridan Rd | 13323 | 41.948150 | -87.663940 | 41.952833 | -87.649993 | member |
| 07DE78092C62F7B3 | classic_bike | 2022-02-16 05:36:06 | 2022-02-16 05:39:00 | State St & Randolph St | TA1305000029 | Franklin St & Lake St | TA1307000111 | 41.884621 | -87.627834 | 41.885837 | -87.635500 | member |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| A303816F2E8A35A8 | electric_bike | 2023-01-11 17:46:23 | 2023-01-11 17:57:31 | Clark St & Elm St | TA1307000039 | Southport Ave & Clybourn Ave | TA1309000030 | 41.902634 | -87.631591 | 41.920771 | -87.663712 | casual |
| BCDBB142CC610382 | classic_bike | 2023-01-30 15:08:10 | 2023-01-30 15:33:26 | Western Ave & Leland Ave | TA1307000140 | Clarendon Ave & Gordon Ter | 13379 | 41.966400 | -87.688704 | 41.957867 | -87.649505 | member |
| 7D1C7CA80517183B | classic_bike | 2023-01-06 19:34:50 | 2023-01-06 19:50:01 | Clark St & Elm St | TA1307000039 | Southport Ave & Clybourn Ave | TA1309000030 | 41.902973 | -87.631280 | 41.920771 | -87.663712 | casual |
| 1A4EB636346DF527 | classic_bike | 2023-01-13 18:59:24 | 2023-01-13 19:14:44 | Clark St & Elm St | TA1307000039 | Southport Ave & Clybourn Ave | TA1309000030 | 41.902973 | -87.631280 | 41.920771 | -87.663712 | casual |
| 069971675AC7DC62 | electric_bike | 2023-01-02 13:48:29 | 2023-01-02 13:59:29 | Clark St & Elm St | TA1307000039 | Southport Ave & Clybourn Ave | TA1309000030 | 41.902822 | -87.631687 | 41.920771 | -87.663712 | casual |
4585800 rows × 12 columns
If the rows with missing information are deleted, 4585800 of the original 5944549 rows remain, about 77% of the data.
Are there duplicate lines in the data?
new_data_year.duplicated().sum()
148305
Yes, there are 148305 duplicated rows (likely a result of the same January file being loaded twice).
Deleting duplicates
new_data_year_no_duplicates = new_data_year.drop_duplicates(keep='first').copy()  # .copy() avoids SettingWithCopyWarning on the assignments below
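For reference, `duplicated()` marks every occurrence after the first as a duplicate, and `drop_duplicates(keep='first')` keeps that first occurrence. A toy sketch on a hypothetical three-row frame:

```python
import pandas as pd

# Tiny hypothetical frame: the second row is an exact repeat of the first.
df = pd.DataFrame({"ride_id": ["A", "A", "B"], "len": [10, 10, 20]})

print(df.duplicated().sum())        # 1 duplicate row
deduped = df.drop_duplicates(keep="first")
print(len(deduped))                 # 2 rows remain
```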
Checking for inconsistencies, i.e., rows where started_at > ended_at
new_data_year_no_duplicates['started_at'] = pd.to_datetime(new_data_year_no_duplicates['started_at'])
new_data_year_no_duplicates['ended_at'] = pd.to_datetime(new_data_year_no_duplicates['ended_at'])
new_data_year_no_duplicates[new_data_year_no_duplicates['started_at']> new_data_year_no_duplicates['ended_at']]
| rideable_type | started_at | ended_at | start_station_name | start_station_id | end_station_name | end_station_id | start_lat | start_lng | end_lat | end_lng | member_casual | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ride_id | ||||||||||||
| 2D97E3C98E165D80 | classic_bike | 2022-03-05 11:00:57 | 2022-03-05 10:55:01 | DuSable Lake Shore Dr & Wellington Ave | TA1307000041 | DuSable Lake Shore Dr & Wellington Ave | TA1307000041 | 41.936688 | -87.636829 | 41.936688 | -87.636829 | casual |
| 7407049C5D89A13D | electric_bike | 2022-03-05 11:38:04 | 2022-03-05 11:37:57 | Sheffield Ave & Wellington Ave | TA1307000052 | Sheffield Ave & Wellington Ave | TA1307000052 | 41.936313 | -87.652522 | 41.936253 | -87.652662 | casual |
| 072E947E156D142D | electric_bike | 2022-06-07 19:14:46 | 2022-06-07 17:07:45 | W Armitage Ave & N Sheffield Ave | 20254.0 | W Armitage Ave & N Sheffield Ave | 20254.0 | 41.920000 | -87.650000 | 41.920000 | -87.650000 | casual |
| BF114472ABA0289C | electric_bike | 2022-06-07 19:14:47 | 2022-06-07 17:05:42 | Base - 2132 W Hubbard | Hubbard Bike-checking (LBS-WH-TEST) | W Armitage Ave & N Sheffield Ave | 20254.0 | 41.917831 | -87.653363 | 41.920000 | -87.650000 | member |
| 029D853B5C38426E | classic_bike | 2022-07-26 20:07:33 | 2022-07-26 19:59:34 | Lincoln Ave & Roscoe St* | chargingstx5 | Lincoln Ave & Roscoe St* | chargingstx5 | 41.943350 | -87.670668 | 41.943350 | -87.670668 | member |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2D98008FFB28C1B8 | electric_bike | 2022-11-06 01:56:17 | 2022-11-06 01:12:19 | Wabash Ave & Grand Ave | TA1307000117 | Wells St & Elm St | KA1504000135 | 41.891129 | -87.626821 | 41.903222 | -87.634324 | casual |
| 112ED5B9200BFD2A | classic_bike | 2022-11-06 01:46:10 | 2022-11-06 01:06:44 | Sheffield Ave & Webster Ave | TA1309000033 | Wells St & Institute Pl | 22001 | 41.921540 | -87.653818 | 41.897380 | -87.634420 | casual |
| 417746CBEB92A34E | classic_bike | 2022-11-06 01:46:17 | 2022-11-06 01:05:13 | Wells St & Hubbard St | TA1307000151 | Aberdeen St & Jackson Blvd | 13157 | 41.889906 | -87.634266 | 41.877726 | -87.654787 | member |
| B5602D5BB3D517F6 | electric_bike | 2022-11-06 01:59:05 | 2022-11-06 01:02:03 | Western Ave & Winnebago Ave | 13068 | California Ave & Milwaukee Ave | 13084 | 41.915592 | -87.687070 | 41.922695 | -87.697153 | member |
| 4139B11634039661 | classic_bike | 2022-11-06 01:58:46 | 2022-11-06 01:11:33 | Clark St & Grace St | TA1307000127 | Broadway & Berwyn Ave | 13109 | 41.950780 | -87.659172 | 41.978353 | -87.659753 | member |
69 rows × 12 columns
This check found 69 rows with inconsistent data.
Deleting those inconsistencies
new_data_year_no_duplicates = new_data_year_no_duplicates[new_data_year_no_duplicates['started_at'] < new_data_year_no_duplicates['ended_at']]
Creating some metrics that will help later on
# ------------------------------------------------------------------------------------------
# ride length
new_data_year_no_duplicates['ride_length'] = new_data_year_no_duplicates['ended_at'] - new_data_year_no_duplicates['started_at']
new_data_year_no_duplicates['ride_length'] = new_data_year_no_duplicates['ride_length'].dt.total_seconds()
# day of the week (pandas uses Monday = 0, so day_of_week runs Monday = 1 .. Sunday = 7)
new_data_year_no_duplicates['day_of_week'] = new_data_year_no_duplicates['started_at'].dt.dayofweek + 1
# Saving the data in csv
new_data_year_no_duplicates.to_csv("Data_Cyclistic/data_12months_no_duplicate.csv")
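A caution on the day encoding: pandas counts Monday as 0, so the day_of_week column above runs Monday = 1 through Sunday = 7, and any label list indexed with it must start with Monday. `dt.day_name()` avoids the arithmetic entirely. A quick sketch with hypothetical dates:

```python
import pandas as pd

# Hypothetical timestamps: 2022-02-14 is a Monday, 2022-02-19 a Saturday.
s = pd.to_datetime(pd.Series(["2022-02-14 11:57:03", "2022-02-19 18:08:41"]))

print(s.dt.dayofweek.tolist())   # [0, 5] -- Monday = 0
print(s.dt.day_name().tolist())  # ['Monday', 'Saturday']
```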
After the cleaning process, we have 4437183 rows with data.
I've decided to put all the calculations together in one cell, so its output works as a summary of what has been done. If you want to see how a number was calculated, just check the corresponding code.
# Loading the DataFrame
data_clean = pd.read_csv("Data_Cyclistic/data_12months_no_duplicate.csv")
data_clean
| ride_id | rideable_type | started_at | ended_at | start_station_name | start_station_id | end_station_name | end_station_id | start_lat | start_lng | end_lat | end_lng | member_casual | ride_length | day_of_week | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | E1E065E7ED285C02 | classic_bike | 2022-02-19 18:08:41 | 2022-02-19 18:23:56 | State St & Randolph St | TA1305000029 | Clark St & Lincoln Ave | 13179 | 41.884621 | -87.627834 | 41.915689 | -87.634600 | member | 915.0 | 6 |
| 1 | 1602DCDC5B30FFE3 | classic_bike | 2022-02-20 17:41:30 | 2022-02-20 17:45:56 | Halsted St & Wrightwood Ave | TA1309000061 | Southport Ave & Wrightwood Ave | TA1307000113 | 41.929143 | -87.649077 | 41.928773 | -87.663913 | member | 266.0 | 7 |
| 2 | BE7DD2AF4B55C4AF | classic_bike | 2022-02-25 18:55:56 | 2022-02-25 19:09:34 | State St & Randolph St | TA1305000029 | Canal St & Adams St | 13011 | 41.884621 | -87.627834 | 41.879255 | -87.639904 | member | 818.0 | 5 |
| 3 | A1789BDF844412BE | classic_bike | 2022-02-14 11:57:03 | 2022-02-14 12:04:00 | Southport Ave & Waveland Ave | 13235 | Broadway & Sheridan Rd | 13323 | 41.948150 | -87.663940 | 41.952833 | -87.649993 | member | 417.0 | 1 |
| 4 | 07DE78092C62F7B3 | classic_bike | 2022-02-16 05:36:06 | 2022-02-16 05:39:00 | State St & Randolph St | TA1305000029 | Franklin St & Lake St | TA1307000111 | 41.884621 | -87.627834 | 41.885837 | -87.635500 | member | 174.0 | 3 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4437178 | A303816F2E8A35A8 | electric_bike | 2023-01-11 17:46:23 | 2023-01-11 17:57:31 | Clark St & Elm St | TA1307000039 | Southport Ave & Clybourn Ave | TA1309000030 | 41.902634 | -87.631591 | 41.920771 | -87.663712 | casual | 668.0 | 3 |
| 4437179 | BCDBB142CC610382 | classic_bike | 2023-01-30 15:08:10 | 2023-01-30 15:33:26 | Western Ave & Leland Ave | TA1307000140 | Clarendon Ave & Gordon Ter | 13379 | 41.966400 | -87.688704 | 41.957867 | -87.649505 | member | 1516.0 | 1 |
| 4437180 | 7D1C7CA80517183B | classic_bike | 2023-01-06 19:34:50 | 2023-01-06 19:50:01 | Clark St & Elm St | TA1307000039 | Southport Ave & Clybourn Ave | TA1309000030 | 41.902973 | -87.631280 | 41.920771 | -87.663712 | casual | 911.0 | 5 |
| 4437181 | 1A4EB636346DF527 | classic_bike | 2023-01-13 18:59:24 | 2023-01-13 19:14:44 | Clark St & Elm St | TA1307000039 | Southport Ave & Clybourn Ave | TA1309000030 | 41.902973 | -87.631280 | 41.920771 | -87.663712 | casual | 920.0 | 5 |
| 4437182 | 069971675AC7DC62 | electric_bike | 2023-01-02 13:48:29 | 2023-01-02 13:59:29 | Clark St & Elm St | TA1307000039 | Southport Ave & Clybourn Ave | TA1309000030 | 41.902822 | -87.631687 | 41.920771 | -87.663712 | casual | 660.0 | 1 |
4437183 rows × 15 columns
# Calculations
# day_of_week runs Monday = 1 .. Sunday = 7 (dt.dayofweek + 1), so the label
# list must start with Monday for days[value - 1] to be correct
days = ['monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday']
# Mean ride_length
mean_ride_length = np.average(data_clean['ride_length'])
print(f"Mean ride length {mean_ride_length} seconds")
# 1018.0223639638032 seconds
print("------------------------------------------------------")
# Max ride_length
max_ride_length = np.max(data_clean['ride_length'])
print(f"Max ride length {max_ride_length} seconds")
# 2061244.0 seconds (about 24 days: almost certainly an outlier, e.g. a bike that was never docked)
print("------------------------------------------------------")
# Most used day of week
most_used_day = st.mode(np.array(data_clean['day_of_week']))
print(f"The most used day was: {days[most_used_day - 1]}")
# 6 -> saturday
print("------------------------------------------------------")
# Average ride_length of members
mean_ride_length_members = np.average(data_clean[data_clean['member_casual']=='member']['ride_length'])
print(f"Mean ride length of members: {mean_ride_length_members} seconds")
# 743.92 seconds
print("------------------------------------------------------")
# Average ride_length of non members (casual)
mean_ride_length_casuals = np.average(data_clean[data_clean['member_casual']=='casual']['ride_length'])
print(f"Mean ride length of casuals: {mean_ride_length_casuals} seconds")
# 1429.11 seconds
# Non members may use more time because they do not use as a way to quickly move around places, they may use to walk around.
print("------------------------------------------------------")
# Most used day of week by members
most_used_day_members = st.mode(np.array(data_clean[data_clean['member_casual']=='member']['day_of_week']))
print(f"Most used day of week by members: {days[most_used_day_members-1]}")
# 2 -> tuesday
print("------------------------------------------------------")
# Most used day of week by casuals
most_used_day_casuals = st.mode(np.array(data_clean[data_clean['member_casual']=='casual']['day_of_week']))
print(f"Most used day of week by casuals: {days[most_used_day_casuals-1]}")
# 6 -> saturday
print("------------------------------------------------------")
# Average ride_length by day of the week
ride_length_by_day = []
for i in range(len(days)):
    ride_length_by_day.append(np.average(data_clean[data_clean['day_of_week'] == i + 1]['ride_length']))
    print(f"Average time on {days[i]}: {ride_length_by_day[i]}")
# in seconds
# Average time on monday: 990.4536910566923
# Average time on tuesday: 887.3830605997467
# Average time on wednesday: 879.2172592014024
# Average time on thursday: 913.2799763426103
# Average time on friday: 975.6575417238967
# Average time on saturday: 1231.9800748637176
# Average time on sunday: 1228.884441941905
print("------------------------------------------------------")
# Average ride_length by day of the week but only for members
ride_length_by_day_members = []
for i in range(len(days)):
    ride_length_by_day_members.append(np.average(data_clean[(data_clean['day_of_week'] == i + 1) & (data_clean['member_casual'] == 'member')]['ride_length']))
    print(f"Average time of members on {days[i]}: {ride_length_by_day_members[i]}")
# Average time of members on monday: 718.0785057330068
# Average time of members on tuesday: 703.4451079163837
# Average time of members on wednesday: 708.4531940595466
# Average time of members on thursday: 719.0868667739798
# Average time of members on friday: 731.1160278698136
# Average time of members on saturday: 836.8331027650607
# Average time of members on sunday: 828.0943054331574
print("------------------------------------------------------")
# Average ride_length by day of the week but only for casuals
ride_length_by_day_casuals = []
for i in range(len(days)):
    ride_length_by_day_casuals.append(np.average(data_clean[(data_clean['day_of_week'] == i + 1) & (data_clean['member_casual'] == 'casual')]['ride_length']))
    print(f"Average time of casuals on {days[i]}: {ride_length_by_day_casuals[i]}")
# Average time of casuals on monday: 1478.9263399657832
# Average time of casuals on tuesday: 1277.5599405696905
# Average time of casuals on wednesday: 1228.4515566492441
# Average time of casuals on thursday: 1267.0297971979421
# Average time of casuals on friday: 1333.372923207027
# Average time of casuals on saturday: 1597.9670304313597
# Average time of casuals on sunday: 1627.782605696265
print("------------------------------------------------------")
# Number of rides by day_of_week
rides_by_day = []
for i in range(len(days)):
    rides_by_day.append(len(data_clean[data_clean['day_of_week'] == i + 1]))
    print(f"Numbers of rides on {days[i]}: {rides_by_day[i]}")
# Numbers of rides on monday: 595954
# Numbers of rides on tuesday: 623930
# Numbers of rides on wednesday: 628627
# Numbers of rides on thursday: 654341
# Numbers of rides on friday: 617392
# Numbers of rides on saturday: 709556
# Numbers of rides on sunday: 607383
print("------------------------------------------------------")
# Number of rides by day_of_week but only members
rides_by_day_members = []
for i in range(len(days)):
    rides_by_day_members.append(len(data_clean[(data_clean['day_of_week'] == i + 1) & (data_clean['member_casual'] == 'member')]))
    print(f"Numbers of rides of members on {days[i]}: {rides_by_day_members[i]}")
# Numbers of rides of members on monday: 382609
# Numbers of rides of members on tuesday: 424032
# Numbers of rides of members on wednesday: 422190
# Numbers of rides of members on thursday: 422440
# Numbers of rides of members on friday: 366705
# Numbers of rides of members on saturday: 341186
# Numbers of rides of members on sunday: 302973
print("------------------------------------------------------")
# Number of rides by day_of_week but only casuals
rides_by_day_casuals = []
for i in range(len(days)):
    rides_by_day_casuals.append(len(data_clean[(data_clean['day_of_week'] == i + 1) & (data_clean['member_casual'] == 'casual')]))
    print(f"Numbers of rides of casuals on {days[i]}: {rides_by_day_casuals[i]}")
# Numbers of rides of casuals on monday: 213345
# Numbers of rides of casuals on tuesday: 199898
# Numbers of rides of casuals on wednesday: 206437
# Numbers of rides of casuals on thursday: 231901
# Numbers of rides of casuals on friday: 250687
# Numbers of rides of casuals on saturday: 368370
# Numbers of rides of casuals on sunday: 304410
Mean ride length 1018.0223639638032 seconds
------------------------------------------------------
Max ride length 2061244.0 seconds
------------------------------------------------------
The most used day was: saturday
------------------------------------------------------
Mean ride length of members: 743.9176837388036 seconds
------------------------------------------------------
Mean ride length of casuals: 1429.1119023260217 seconds
------------------------------------------------------
Most used day of week by members: tuesday
------------------------------------------------------
Most used day of week by casuals: saturday
------------------------------------------------------
Average time on monday: 990.4536910566923
Average time on tuesday: 887.3830605997467
Average time on wednesday: 879.2172592014024
Average time on thursday: 913.2799763426103
Average time on friday: 975.6575417238967
Average time on saturday: 1231.9800748637176
Average time on sunday: 1228.884441941905
------------------------------------------------------
Average time of members on monday: 718.0785057330068
Average time of members on tuesday: 703.4451079163837
Average time of members on wednesday: 708.4531940595466
Average time of members on thursday: 719.0868667739798
Average time of members on friday: 731.1160278698136
Average time of members on saturday: 836.8331027650607
Average time of members on sunday: 828.0943054331574
------------------------------------------------------
Average time of casuals on monday: 1478.9263399657832
Average time of casuals on tuesday: 1277.5599405696905
Average time of casuals on wednesday: 1228.4515566492441
Average time of casuals on thursday: 1267.0297971979421
Average time of casuals on friday: 1333.372923207027
Average time of casuals on saturday: 1597.9670304313597
Average time of casuals on sunday: 1627.782605696265
------------------------------------------------------
Numbers of rides on monday: 595954
Numbers of rides on tuesday: 623930
Numbers of rides on wednesday: 628627
Numbers of rides on thursday: 654341
Numbers of rides on friday: 617392
Numbers of rides on saturday: 709556
Numbers of rides on sunday: 607383
------------------------------------------------------
Numbers of rides of members on monday: 382609
Numbers of rides of members on tuesday: 424032
Numbers of rides of members on wednesday: 422190
Numbers of rides of members on thursday: 422440
Numbers of rides of members on friday: 366705
Numbers of rides of members on saturday: 341186
Numbers of rides of members on sunday: 302973
------------------------------------------------------
Numbers of rides of casuals on monday: 213345
Numbers of rides of casuals on tuesday: 199898
Numbers of rides of casuals on wednesday: 206437
Numbers of rides of casuals on thursday: 231901
Numbers of rides of casuals on friday: 250687
Numbers of rides of casuals on saturday: 368370
Numbers of rides of casuals on sunday: 304410
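The per-day loops above can also be written as a single groupby, which returns the mean and count for every (rider type, day) pair at once. A sketch on a hypothetical mini-frame:

```python
import pandas as pd

# Hypothetical mini-frame with the same columns the loops above filter on.
df = pd.DataFrame({
    "member_casual": ["member", "member", "casual", "casual", "casual"],
    "day_of_week":   [1, 1, 6, 6, 7],
    "ride_length":   [600.0, 800.0, 1500.0, 1700.0, 2000.0],
})

# One aggregation call replaces the explicit per-day / per-type loops.
summary = df.groupby(["member_casual", "day_of_week"])["ride_length"].agg(["mean", "count"])
print(summary)
```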
Members account for the larger share of rides over the past year (about 60% of them), with a mean ride time of roughly 12 minutes. Casual riders, on the other hand, have a mean ride time of almost 24 minutes. Why?
When we break usage down by day of week and type of user, we see that members ride most during the midweek, while casual riders ride most on the weekend. With shorter rides concentrated in the midweek, we can assume that members use the service to travel between locations (perhaps commuting to work), while casual riders use it for leisure.
As the calculations show, members tend to use Cyclistic bikes to move around the city during the week, while casual riders use the service for weekend leisure. Each group sees the service in a different way.
I think some charts will help visualize that difference.
Charts I consider less important are hidden.
# Pie Chart numbers of rides
n_members = len(data_clean[data_clean['member_casual']=='member'])
n_casuals = len(data_clean[data_clean['member_casual']=='casual'])
data = {'labels': ['members', 'casuals'],
'number of rides': [n_members, n_casuals]}
fig = px.pie(data, values='number of rides', names='labels', title='Number of rides last year')
fig.show()
# Bar chart: Average time by type of user
data = {'labels': ['members', 'casuals'],
'Mean time': [mean_ride_length_members, mean_ride_length_casuals]}
fig = px.bar(data, x='labels', y='Mean time', title = 'Average time by type of user')
fig.update_layout(
    xaxis=dict(title='Type of Users'),
    yaxis=dict(title='Mean time (seconds)', side='left')
)
fig.show()
# Bar chart: Rides By day of week
data = {'Days': days,
'Rides by day': rides_by_day}
fig = px.bar(data, x='Days', y='Rides by day', title = 'Rides By day of week')
fig.show()
# Bar chart: Rides By day of week (only members)
data = {'Days': days,
'Rides by day': rides_by_day_members}
fig = px.bar(data, x='Days', y='Rides by day', title = 'Rides By day of week (only members)')
fig.show()
# Bar chart: Rides By day of week (only casuals)
data = {'Days': days,
'Rides by day': rides_by_day_casuals}
fig = px.bar(data, x='Days', y='Rides by day', title = 'Rides By day of week (only casuals)')
fig.show()
# Bar chart: Mean time by day of week
data = {'Days': days,
'Mean time by day': ride_length_by_day}
fig = px.bar(data, x='Days', y='Mean time by day', title = 'Mean time by day of week')
fig.show()
# Bar chart: Mean time by day of week (only members)
data = {'Days': days,
'Mean time by day': ride_length_by_day_members}
fig = px.bar(data, x='Days', y='Mean time by day', title = 'Mean time by day of week (only members)')
fig.show()
# Bar chart: Mean time by day of week (only casuals)
data = {'Days': days,
'Mean time by day': ride_length_by_day_casuals}
fig = px.bar(data, x='Days', y='Mean time by day', title = 'Mean time by day of week (only casuals)')
fig.show()
Now, after analyzing the data and extracting a story from it, what could we do to increase annual memberships?
I thought of some ideas while analyzing and visualizing the data:
- A marketing campaign promoting the service as a way to get around the city during the week;
- Develop app features with weekly circuits to complete (this may attract casual riders to sign up for a membership);
- Marketing campaigns around group activities (this may attract more casual riders and increase member usage on weekends).